25 research outputs found

    Fine-Grained Head Pose Estimation Without Keypoints

    Full text link
    Estimating the head pose of a person is a crucial problem with many applications, such as aiding gaze estimation, modeling attention, fitting 3D models to video and performing face alignment. Traditionally, head pose is computed by estimating keypoints from the target face and solving the 2D-to-3D correspondence problem with a mean human head model. We argue that this is a fragile method because it relies entirely on landmark detection performance, the extraneous head model and an ad-hoc fitting step. We present an elegant and robust way to determine pose by training a multi-loss convolutional neural network on 300W-LP, a large synthetically expanded dataset, to predict intrinsic Euler angles (yaw, pitch and roll) directly from image intensities through joint binned pose classification and regression. We present empirical tests on common in-the-wild pose benchmark datasets which show state-of-the-art results. Additionally, we test our method on a dataset typically used for depth-based pose estimation and begin to close the gap with state-of-the-art depth pose methods. We open-source our training and testing code as well as release our pre-trained models. Comment: Accepted to the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
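
    The joint binned classification and regression described above can be illustrated with a short PyTorch sketch: classify each Euler angle into coarse bins, then regress a continuous angle as the expectation over the bin probabilities. The bin layout (66 bins of 3 degrees covering roughly ±99 degrees) and the loss weighting below are illustrative assumptions, not necessarily the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Illustrative bin layout: 66 bins of 3 degrees covering [-99, 99).
NUM_BINS = 66
bin_centers = torch.arange(NUM_BINS, dtype=torch.float32) * 3 - 99 + 1.5

def joint_angle_loss(logits, angle_gt, alpha=0.5):
    """logits: (batch, NUM_BINS) raw scores for one Euler angle.
    angle_gt: (batch,) ground-truth angle in degrees."""
    # Classification term: which bin does the true angle fall into?
    bin_gt = ((angle_gt + 99) / 3).long().clamp(0, NUM_BINS - 1)
    cls_loss = nn.functional.cross_entropy(logits, bin_gt)
    # Regression term: expected angle under the softmax distribution.
    probs = nn.functional.softmax(logits, dim=1)
    angle_pred = (probs * bin_centers).sum(dim=1)
    reg_loss = nn.functional.mse_loss(angle_pred, angle_gt)
    return cls_loss + alpha * reg_loss
```

    Taking the expectation over bins rather than the argmax keeps the regression term differentiable, while the classification term constrains the prediction to a plausible coarse range.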

    Learning to Localize and Align Fine-Grained Actions to Sparse Instructions

    Full text link
    Automatic generation of textual video descriptions that are time-aligned with video content is a long-standing goal in computer vision. The task is challenging due to the difficulty of bridging the semantic gap between the visual and natural language domains. This paper addresses the task of automatically generating an alignment between a set of instructions and a first-person video demonstrating an activity. The sparseness of the descriptions and the ambiguity of written instructions create significant alignment challenges. The key to our approach is the use of egocentric cues to generate a concise set of action proposals, which are then matched to recipe steps using object recognition and computational linguistic techniques. We obtain promising results on both the Extended GTEA Gaze+ dataset and the Bristol Egocentric Object Interactions Dataset.
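
    The matching stage described above can be framed as an ordered alignment problem: given a similarity score between each action proposal and each recipe step (e.g., derived from recognized object labels), find the best monotonic assignment, allowing some proposals to go unmatched. The dynamic program below is an illustrative stand-in, not the paper's exact formulation.

```python
import numpy as np

def align_proposals_to_steps(sim, skip_cost=0.2):
    """Monotonically align action proposals (rows of `sim`) to ordered
    recipe steps (columns), allowing unmatched proposals at a small cost.
    Returns a list mapping each proposal index to a step index or None."""
    n, m = sim.shape
    best = np.zeros((n + 1, m + 1))
    move = np.zeros((n + 1, m + 1), dtype=int)  # 1=skip proposal, 2=skip step, 3=match
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0:
                cands.append((best[i - 1, j] - skip_cost, 1))
            if j > 0:
                cands.append((best[i, j - 1], 2))
            if i > 0 and j > 0:
                cands.append((best[i - 1, j - 1] + sim[i - 1, j - 1], 3))
            best[i, j], move[i, j] = max(cands)
    # Trace back the chosen step (or None) for each proposal.
    assignment, i, j = [None] * n, n, m
    while i > 0 or j > 0:
        if move[i, j] == 3:
            assignment[i - 1] = j - 1
            i, j = i - 1, j - 1
        elif move[i, j] == 2:
            j -= 1
        else:
            i -= 1
    return assignment
```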

    Modelling seasonal environmental preferences of tropical tuna purse seine fisheries in the Mozambique Channel

    Get PDF
    The spatio-temporal environmental preferences and biomass aggregation of tropical tuna caught by the purse seine fishery in the Mozambique Channel (MZC) have barely been investigated. In this study, tuna biomass from Fish Aggregating Devices (FADs) and Free-Swimming Schools (FSC), collected from Spanish fishing logbooks during 2003–2013, was modelled separately for each fishing mode as a function of a set of oceanographic variables (sea surface temperature, sea surface height, geostrophic currents, salinity, and chlorophyll-a) using Generalized Additive Models (GAMs). Temporal variables (day of the year, month and year) and spatial variables (latitude and longitude) were included in the models to account for the spatio-temporal structure of tropical tuna biomass aggregation. Oceanographic, temporal and spatial effects on aggregated catches differed between fishing modes, even though some common aspects appeared across the study area and period. Fishable patches of tuna biomass accumulation were explained by sea surface temperature, productivity, sea surface height and geostrophic currents, in addition to interactions among the spatio-temporal variables. Although the models predicted slight differences in preferred tuna fishing spots, the two fishing modes partially overlapped. Goodness-of-fit statistics for the selected variables showed that the models were able to predict tuna catch aggregation patterns in the MZC reasonably well. These results highlight a connection between the biophysical state of the ocean and purse seine tuna catches in the MZC, and may ultimately contribute to scientific advice for the appropriate management and conservation of the resources exploited by purse seine fleets in the MZC.
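
    A GAM of this kind can be sketched with the pygam library as a stand-in (the abstract does not name the software used). The synthetic data, column layout and smoother choices below are illustrative assumptions only; the study fits FAD and free-school sets separately.

```python
import numpy as np
from pygam import LinearGAM, s, te

rng = np.random.default_rng(0)
n = 500  # synthetic stand-in for logbook records
X = np.column_stack([
    rng.uniform(24, 31, n),     # 0: sea surface temperature (deg C)
    rng.uniform(-0.3, 0.3, n),  # 1: sea surface height anomaly (m)
    rng.uniform(0, 1.5, n),     # 2: geostrophic current speed (m/s)
    rng.uniform(34, 36, n),     # 3: salinity (PSU)
    rng.uniform(0, 1, n),       # 4: chlorophyll-a (mg/m3)
    rng.uniform(35, 45, n),     # 5: longitude (deg E)
    rng.uniform(-25, -10, n),   # 6: latitude (deg)
    rng.uniform(1, 365, n),     # 7: day of the year
])
y = rng.gamma(2.0, 10.0, n)     # synthetic catch biomass per set (t)

gam = LinearGAM(
    s(0) + s(1) + s(2) + s(3) + s(4)  # one smooth per oceanographic covariate
    + te(5, 6)                        # lon x lat tensor smooth (spatial effect)
    + s(7, basis="cp")                # cyclic smooth for seasonality
).fit(X, y)
gam.summary()
```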

    Platypus: Quick, Cheap, and Powerful Refinement of LLMs

    Full text link
    We present Platypus, a family of fine-tuned and merged Large Language Models (LLMs) that achieves the strongest performance and currently stands in first place on HuggingFace's Open LLM Leaderboard as of this work's release date. In this work we describe: (1) our curated dataset Open-Platypus, a subset of other open datasets which we release to the public; (2) our process of fine-tuning and merging LoRA modules in order to conserve the strong prior of pretrained LLMs while bringing specific domain knowledge to the surface; and (3) our efforts in checking for test data leaks and contamination in the training data, which can inform future research. Specifically, the Platypus family achieves strong performance in quantitative LLM metrics across model sizes, topping the global Open LLM Leaderboard while using just a fraction of the fine-tuning data and overall compute required by other state-of-the-art fine-tuned LLMs. In particular, a 13B Platypus model can be trained on a single A100 GPU using 25k questions in 5 hours. This is a testament to the quality of our Open-Platypus dataset, and opens opportunities for further improvements in the field. Project page: https://platypus-llm.github.io
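
    One common variant of the fine-tune-then-merge workflow the abstract describes can be sketched with Hugging Face PEFT: train LoRA adapters on the curated data, then fold the adapter deltas back into the base weights. The model ID and hyperparameters below are placeholders, not the paper's exact recipe.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

BASE = "meta-llama/Llama-2-13b-hf"  # assumed base model, for illustration

base = AutoModelForCausalLM.from_pretrained(BASE)
config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # adapt attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)  # only adapter params are trainable
# ... fine-tune `model` on the curated instruction data here ...
merged = model.merge_and_unload()     # fold LoRA deltas into the base weights
merged.save_pretrained("platypus-13b-merged")
```

    Because only the low-rank adapter weights are updated, the base model's prior is preserved during training, and merging afterwards yields a standalone model with no inference-time overhead.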